[ML] Resolve duplicate key exception in GetDatafeedRunningStateAction #125477
Implement merge function for duplicate datafeed states when a datafeed is force-stopped and restarted before cancellation completes. Select the most appropriate state based on:
1. Prefer the state with the more recent searchInterval.startMs when both exist
2. Prefer states with searchInterval over those without
3. Default to the second state when all criteria are equal
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Pinging @elastic/ml-core (Team:ML)
Thanks for the contribution @hye-on and welcome to Elasticsearch
The logic you've applied makes perfect sense to me, this is a practical solution given that Please can you add a unit test to cover the logic in the
@elasticmachine test this please
@davidkyle I’ll add the test! I’ll reach out if I need any help. Thank you for the review! :)
@davidkyle I’ve added the tests! Thank you!
@elasticmachine test this please
LGTM
Thanks for the contribution
This PR fixes issue #104160 where a duplicate key exception occurs in GetDatafeedRunningStateAction.Response.fromResponses(). The issue happens when a datafeed is force-stopped and restarted before its local task cancellation completes. As a result, two local tasks for the same datafeed temporarily coexist on the ML node (one cancelling, one starting), and the duplicate key error is thrown when both report their state.
The solution implements a merge function in the toMap collector that selects the most appropriate state when duplicates are found, based on the searchInterval data.
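Select the most appropriate state based on:
1. Prefer the state with the more recent searchInterval.startMs when both exist
2. Prefer states with searchInterval over those without
3. Default to the second state when all criteria are equal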
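For readers unfamiliar with the three-argument form of Collectors.toMap, the selection criteria described in this PR might be sketched roughly as follows. This is only an illustration, not the code in the PR: the records SearchInterval, RunningState and TaskResponse and the class DatafeedStateMergeSketch are hypothetical, simplified stand-ins and do not mirror the actual Elasticsearch classes.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class DatafeedStateMergeSketch {

    // Hypothetical stand-ins for the real response types (assumption, not the Elasticsearch API).
    record SearchInterval(long startMs, long endMs) {}

    record RunningState(boolean realTimeRunning, SearchInterval searchInterval) {}

    record TaskResponse(String datafeedId, RunningState state) {}

    // Merge function applied when two tasks report state for the same datafeed id.
    static RunningState selectMostRecent(RunningState first, RunningState second) {
        SearchInterval a = first.searchInterval();
        SearchInterval b = second.searchInterval();
        if (a != null && b != null) {
            // Criterion 1: the more recent startMs wins; a tie falls through to the second state.
            return a.startMs() > b.startMs() ? first : second;
        }
        if (a != null) {
            // Criterion 2: only the first state has a search interval.
            return first;
        }
        // Criterion 2 (only the second has an interval) or criterion 3 (neither has one).
        return second;
    }

    // Without the third argument, Collectors.toMap throws IllegalStateException on a duplicate key.
    static Map<String, RunningState> toStateMap(List<TaskResponse> responses) {
        return responses.stream()
            .collect(Collectors.toMap(
                TaskResponse::datafeedId,
                TaskResponse::state,
                DatafeedStateMergeSketch::selectMostRecent));
    }
}
```

With this shape, two responses for the same datafeed id collapse into a single map entry instead of raising the duplicate key error, and the entry kept is the one chosen by the three criteria above.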
Comment
I'm new to Elasticsearch and open source contributions in general. I went with searchInterval.startMs as the selection criteria, but I'd appreciate any feedback on whether there might be a better approach for handling these duplicate states. Thank you for your guidance! :)
Fixes #104160